Qualitative Data Cleaning
Abstract
Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. A data cleaning exercise often consists of two phases: error detection and error repairing. Error detection techniques can be either quantitative or qualitative, and error repairing is performed by applying data transformation scripts, by involving human experts, or sometimes both. In this tutorial, we discuss the main facets and directions in designing qualitative data cleaning techniques. We present a taxonomy of current qualitative error detection techniques, as well as a taxonomy of current data repairing techniques. We also discuss proposals for tackling the challenges of cleaning “big data” in terms of scale and distribution.
Similar resources
Chemical cleaning of potable water membranes: a review
The literature on chemical cleaning of polymeric hollow fibre ultrafiltration and microfiltration membranes used in the filtration of water for municipal water supply is reviewed. The review considers the chemical cleaning mechanism, and the perceived link between this and membrane fouling by natural organic matter (NOM) – the principal foulant in municipal potable water applications. Existing ...
QUALITATIVE INTERVIEWING IN INTERNET STUDIES Playing with the media, playing with the method
This methodological paper addresses practical strategies, implications, benefits and drawbacks of collecting qualitative semi-structured interview data about Internet-based research topics using four different interaction systems: face to face; telephone; email; and instant messaging. The discussion presented here is based on a review of the literature and reflection on the experiences of the a...
The importance of cleaning for the overall results of processing endoscopes.
Reprocessing comprises three steps: cleaning, disinfection and, if required, sterilisation. While the extents of disinfection and of sterilisation are quantitatively defined, there are only imprecise (qualitative) definitions of cleaning. There are two main reasons for accurate cleaning. First, organic and inorganic materials that remain on inner and outer surfaces will interfere with the efficacy...
Characterization of occupational exposures to cleaning products used for common cleaning tasks-a pilot study of hospital cleaners
Background: In recent years, cleaning has been identified as an occupational risk because of an increased incidence of reported respiratory effects, such as asthma and asthma-like symptoms among cleaning workers. Due to the lack of systematic occupational hygiene analyses and workplace exposure data, it is not clear which cleaning-related exposures induce or aggravate asthma and other respirator...
Comparative Analysis of Different Imputation Methods to Treat Missing Values in Data Mining Environment
Data cleaning is one of the important steps of the KDD (knowledge discovery in databases) process. One critical problem in data cleaning is the presence of missing values. Various approaches have been proposed to find and replace such missing data, including use of the mean value, use of a global constant, replacement by a more probable value, etc. Imputation is one of the important procedures in statistics that is used ...
Journal title: PVLDB
Volume: 9
Publication date: 2016